Usages of an external duration model for HMM-based speech synthesis

نویسندگان

  • Javier Latorre
  • Sabine Buchholz
  • Masami Akamine
چکیده

In this paper we analyze three different approaches to improving the quality of an HMM-based speech synthesizer by means of an external duration model. The first approach uses the external duration model in a standard way to define the phone duration during synthesis. The second is a novel approach that uses the phone duration to create additional context features for the decision trees clustering. The third is a combination of the previous two approaches. A subjective evaluation showed a quality improvement with respect to the baseline for all three approaches, although for differing reasons. The standard approach produces an improvement in the duration estimation. The second approach degrades the duration estimation but improves the logF0 and aperiodicity by better modeling of their dependencies with respect to the duration. Finally, the combined approach benefits from the improvements of the other two and yields the best result of ca. 16% higher preference than the baseline among native English speakers.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Speech enhancement based on hidden Markov model using sparse code shrinkage

This paper presents a new hidden Markov model-based (HMM-based) speech enhancement framework based on the independent component analysis (ICA). We propose analytical procedures for training clean speech and noise models by the Baum re-estimation algorithm and present a Maximum a posterior (MAP) estimator based on Laplace-Gaussian (for clean speech and noise respectively) combination in the HMM ...

متن کامل

Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...

متن کامل

State-Correlated Duration Model for HMM-Based Speech Synthesis System1

This paper proposes a State-Correlated Duration model for HMM-based speech synthesis system. It uses an improved forward-backward algorithm to estimate the state-duration transition probability between the neighboring states. In the synthesis part, we determine the state duration taking account of the state-duration transition probability. Experiment results show that the speech we synthesized ...

متن کامل

Explicit duration modelling in HMM-based speech synthesis using a hybrid hidden Markov model-multilayer perceptron

In HMM-based speech synthesis, it is important to correctly model duration because it has a significant effect on the perceptual quality of speech, such as rhythm. For this reason, hidden semi-Markov model (HSMM) is commonly used to explicitly model duration instead of using the implicit state duration model of HMM through its transition probabilities. The cost of using HSMM to improve duration...

متن کامل

Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis

In this paper, we describe an HMM-based speech synthesis system in which spectrum, pitch and state duration are modeled simultaneously in a unified framework of HMM. In the system, pitch and state duration are modeled by multi-space probability distribution HMMs and multi-dimensional Gaussian distributions, respectively. The distributions for spectral parameter, pitch parameter and the state du...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009